Abstract: Many social networks, e.g., Slashdot and Twitter,
can be represented as directed graphs (digraphs) with two types
of links between entities: mutual (bi-directional) and one-way 
(uni-directional) connections. Social science theories reveal that 
mutual connections are more stable than one-way connections, 
and one-way connections exhibit various tendencies to become 
mutual connections. It is therefore important to take such 
tendencies into account when performing clustering of social 
networks with both mutual and one-way connections. 

In this paper, we utilize dyadic methods to analyze social
networks, and develop a generalized mutuality tendency theory
to capture the tendencies of those node pairs which tend to
establish mutual connections more frequently than would occur
by chance. Using these results, we develop a mutuality-tendency-aware
spectral clustering algorithm to identify more stable
clusters by maximizing the within-cluster mutuality tendency 
and minimizing the cross-cluster mutuality tendency. Extensive 
simulation results on synthetic datasets as well as real online 
social network datasets such as Slashdot, demonstrate that our 
proposed mutuality-tendency-aware spectral clustering algorithm 
extracts more stable social community structures than traditional 
spectral clustering methods. 



I. Introduction 

Graph models are widely utilized to represent relations
among entities in social networks. In particular, in many online
social networks, e.g., Slashdot and Twitter, the users'
social relationships are represented as directed edges in
directed graphs (or in short, digraphs). Entity connections in
a digraph can be categorized into two types, namely,
bi-directional links (mutual connections) and uni-directional links
(one-way connections). Social theories [28] and online social
network analysis [2], [7], [28] have revealed that various 
types of connections exhibit different stabilities, where mutual 
connections are more stable than one-way connections. In
other words, mutual connections are the source of social
cohesion [3], [4]: if two individuals mutually attend to
one another, the bond is reinforced in each direction.

Studying the social network structure and properties of
social ties has been an active area of research. Clustering and
identifying social structures in social networks is an especially 
important problem [9], [17], [24] that has wide applications, 
for instance, community detection and friend recommendation 
in social networks. Existing clustering methods [21], [29]
were originally developed for undirected graphs, based on the
classical spectral clustering theory. Several recent studies (see,
e.g., [11], [21], [27], [29]) extend the spectral clustering
method to digraphs by first converting the underlying digraphs
to undirected graphs via some form of symmetrization, and
then applying spectral clustering to the resulting symmetrized
(undirected) graphs. However, all these methods have two
common drawbacks, which prevent them from obtaining stable 
clusters with more mutual connections. First, these methods 
do not explicitly distinguish between mutual and one-way 
connections commonly occurring in many social networks, 
treating them essentially as the same and therefore ignoring 
the different social relations and interpretations these two types 
of connections represent (see Section II for more in-depth
discussion). Second, by simply minimizing the total cross-cluster
links (that are symmetrized in some fashion), these
methods do not explicitly account for the potential tendencies
of node pairs to become mutually connected. As a simple
example, Fig. 1 shows two groups of people in a network,
where people in the same group tend to have more mutual
(stable) connections, and people across the two groups have more
one-way (unstable) connections. When using the traditional
spectral clustering method, as shown in Fig. 1(a), group B
will be partitioned into two clusters, due to its strict rule of
minimizing the total number of cross-cluster edges. On the
other hand, the correct partition is the one shown in
Fig. 1(b), where the majority of mutual (stable) connections
are placed within clusters, and one-way (unstable) connections
are placed across clusters.

Fig. 1. An example network: (a) Traditional Spectral Clustering; (b) Tendency Aware Spectral Clustering.

In this paper, we propose and develop a stable social cluster
detection algorithm that takes into account the tendencies of
node pairs to form (or not to form) mutual, and thus stable,
connections, which can result in more stable cluster structures. To
tackle this clustering problem, we need to answer the following 
questions: 1) how to track and evaluate the tendencies of 
node pairs to become mutual (stable) relations? and 2) how to 
cluster the entities in social networks by accounting for their 
mutuality tendencies so as to extract more stable clustering 
structures? 

To address these questions, we utilize dyadic methods to 
analyze social networks, and develop a generalized mutuality 
tendency theory which better captures the tendencies of node
pairs to establish mutual connections more frequently
than would occur by chance. Using these results, we develop
a mutuality-tendency-aware spectral clustering algorithm to
detect more stable clusters by maximizing the within-cluster
mutuality tendency and minimizing the cross-cluster mutuality
tendency. Our contributions are summarized as follows.

• Motivated by the social science mutuality tendency theory,
we establish a new cluster-based mutuality tendency
theory which yields a symmetrized mutuality tendency
for each node pair, and provides a measure of strength of
social ties among nodes in a cluster.

• Based on our theory, we develop a mutuality-tendency-aware
spectral clustering algorithm that can partition
social graphs into stable clusters, by maximizing the
within-cluster mutuality tendencies and minimizing the
cross-cluster mutuality tendencies.

• The experimental results, based on both synthetic and
real social network datasets, confirm that our clustering
algorithm is able to generate more stable clusters than
the traditional spectral clustering algorithms.

To the best of our knowledge, this is the first work studying the
impact of tendencies of node pairs to become mutual connections
on the stability of cluster structure of social networks. The
remainder of the paper is organized as follows. In Section II we
briefly discuss the existing dyadic analysis methods, the
traditional spectral clustering algorithms and other related work.
In Section III we introduce a cluster-based mutuality tendency
theory, and based on this theory, we develop a
mutuality-tendency-aware spectral clustering algorithm in Section IV. In
Section V, we evaluate the performance of our method using
synthetic and real social network (e.g., Slashdot) datasets. We
conclude the paper in Section VI.

II. Preliminaries, Related Work and Problem Definition

In this section, we first introduce the existing dyadic analysis 
methods in the social theory literature for analyzing and 
characterizing social network mutual connections and one-way 
connections. We then present the classic spectral clustering 
theory which was developed for undirected graphs, and briefly 
survey some related works which apply this theory to digraphs 
through symmetrization. We argue that these existing methods 
for clustering digraphs via symmetrization are inadequate in 
solving social network clustering problems, as they ignore the
different social ties (and mutuality tendencies) represented by
mutual and one-way connections in social networks. We end 
the section with the problem definition, namely, how to iden- 
tify stable clusters in social networks by taking into account 
mutuality tendencies of mutual and one-way connections. 

A. Dyadic Analysis and Mutuality Tendency 

Given a social network with both uni- and bi-directional
links, such a network can be represented as a (simple) digraph
$G = (V, E)$ with $|V| = n$ nodes. If the links also have
weights (say, representing the strength of connections or social
ties), such a network can be more generally represented as a
weighted digraph, $G = (V, E, A)$, where $A_{ij}$ represents the
strength of connection or "affinity" from node $i$ to node $j$.
When $A$ is a 0-1 matrix, $G$ reduces to a simple digraph,
and $A$ is the standard adjacency matrix of the digraph, where
$A_{ij} = 1$ if the directed edge $i \to j$ is present, and $A_{ij} = 0$
otherwise. In this paper, for simplicity, we focus primarily on
simple (unweighted) digraphs with no self-loops, namely, social
networks with unweighted directional links. Most online social
networks are of the unweighted variety.

Social scientists commonly view the social network $G$ as a
collection of dyads [28], where a dyad is an unordered pair
of nodes together with the directed edges between the two nodes
in the pair. Denote a dyad as $Dy_{ij} = (A_{ij}, A_{ji})$, for $i < j$.
Since a dyad is an unordered notion, we have in total
$N_d = n(n-1)/2$ dyads in $G$. There are only three possible
isomorphism classes of dyads. The first type of dyad is the mutual
relationship, where both directed edges $i \to j$ and $j \to i$ are
present. The second type is the one-way relationship, where either
$i \to j$ or $j \to i$ is present, but not both. The last type is the
null relationship, where no edges are present between $i$
and $j$.

Interpretations of dyads. Social scientists have observed 
that mutual social relations and one-way relations in social 
networks typically exhibit different stabilities, namely, mutual 
relations are more stable than one-way relations [28]. Hence 
in the social science literature, one prevalent interpretation
of dyadic relations in social networks is the following:
mutual dyads are considered as stable connections between 
two nodes and null relation dyads represent no relations; 
the one-way dyads [1], [6], [16], [18], [20] are viewed as 
an intermediate state of relations, which are in transition 
to more stable equilibrium states of reciprocity (mutual or 
no relation). Several recent empirical studies [7], [10] of 
online social networks have further revealed and confirmed 
that mutual social relations are more stable relations than one- 
way connections. 

Computing dyad census. Consider a (simple) digraph $G = (V, E)$
with $n = |V|$ nodes, and let $m$, $b$, and $u$ denote the number
of mutual, one-way, and null dyads in the network. Clearly,
$m + b + u = n(n-1)/2$. The triple $(m, b, u)$ is referred to
as the dyad census, since it is derived from an examination
of all (possible) dyads in the network. The dyad census triple
can be computed in terms of the adjacency matrix $A$ of $G$ as
follows (in both scalar and matrix forms):

$$m = \sum_{i<j} A_{ij} A_{ji} = \frac{1}{2}\,\mathrm{tr}(AA),$$

$$b = |E| - 2m = \mathrm{tr}(AA^T) - \mathrm{tr}(AA),$$

$$u = N_d - b - m = N_d - \mathrm{tr}(AA^T) + \frac{1}{2}\,\mathrm{tr}(AA).$$
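The dyad census can be checked numerically with a few lines of NumPy. The following is a minimal sketch, not code from the paper; the function name `dyad_census` and the toy adjacency matrix `A_example` are assumptions made purely for illustration.

```python
import numpy as np

def dyad_census(A):
    """Return (m, b, u): the numbers of mutual, one-way and null dyads
    of a simple digraph given by its 0-1 adjacency matrix A."""
    A = np.asarray(A, dtype=int)
    n = A.shape[0]
    n_dyads = n * (n - 1) // 2              # N_d = n(n-1)/2
    m = int(np.trace(A @ A)) // 2           # tr(AA) counts every mutual dyad twice
    b = int(np.trace(A @ A.T)) - 2 * m      # tr(AA^T) = |E| for a 0-1 matrix
    u = n_dyads - m - b
    return m, b, u

# Hypothetical toy digraph: 0 <-> 1 is mutual, 1 -> 2 and 2 -> 3 are one-way.
A_example = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])
census = dyad_census(A_example)
```

For this toy graph the three counts sum to $N_d = 6$, as required by $m + b + u = n(n-1)/2$.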

Measuring mutuality tendency. The notion of mutuality 
tendency has been introduced in the social science literature 
(see, e.g., [8], [28]) to measure the tendency for a node pair to 
establish mutual connections. For any dyad between $i$ and $j$ in
a digraph $G$, if $i$ places a link to $j$, $\rho_{ij}$ represents the tendency
that $j$ will reciprocate to $i$ more frequently than would occur
by chance.

Let $X_{ij}$ denote the random variable that represents whether
or not node $i$ places a directed edge to node $j$. There are only
two possible events (i.e., $X_{ij}$ takes two possible values): $X_{ij} = 1$,
representing that the edge is present; or $X_{ij} = 0$, that the edge is not
present. With a slight abuse of notation, let $X_{ij}$ (resp. $\bar{X}_{ij}$)
also denote the event $\{X_{ij} = 1\}$ (resp. $\{X_{ij} = 0\}$). Then the
probability of the event $X_{ij}$ occurring is $P(X_{ij})$. The probability
that $i$ places a directed edge to $j$ and $j$ reciprocates back (i.e.,
node $i$ and node $j$ are mutually connected) is thus given by

$$P(X_{ij}, X_{ji}) = P(X_{ij})\,P(X_{ji} \mid X_{ij}). \qquad (1)$$



Wolfe [28] introduces the following measure of mutuality
tendency in terms of the conditional probability $P(X_{ji} \mid X_{ij})$:

$$P(X_{ji} \mid X_{ij}) = P(X_{ji}) + \rho_{ij}\,P(\bar{X}_{ji}), \qquad \rho_{ij} = \frac{P(X_{ij}, X_{ji}) - P(X_{ij})\,P(X_{ji})}{P(X_{ij})\,P(\bar{X}_{ji})}, \qquad (2)$$



where $-\infty < \rho_{ij} \le 1$ ensures that $0 \le P(X_{ji}) + \rho_{ij} P(\bar{X}_{ji}) \le 1$
holds. Like many indices used in statistics, $\rho_{ij}$ is dimensionless
and easy to interpret, since it uses 0 and 1 as benchmarks. If
$\rho_{ij} = 1$, the mutuality tendency is maximum, meaning that
given that node $i$ places a link to node $j$, node $j$ will for
sure reciprocate. If $\rho_{ij} = 0$ (i.e., $P(X_{ji} \mid X_{ij}) = P(X_{ji})$),
then node $j$ reciprocates and places a link to node $i$ purely
by chance, namely, it is independent of the event that node $i$
places a link to node $j$. Hence when $0 < \rho_{ij} < 1$, it suggests
more than a chance tendency for node $j$ to reciprocate back.
Furthermore, if $\rho_{ij} < 0$ (i.e., $P(X_{ji} \mid X_{ij}) < P(X_{ji})$), there is
less than chance tendency for node $j$ to reciprocate; in other
words, it suggests a tendency away from mutual dyads, toward
one-way and null dyads. Hence, $-\infty < \rho_{ij} \le 1$ provides a
measure of the strength of tendency for reciprocation.

In eq.(2), the joint distribution $P(X_{ij}, X_{ji})$
can be measured from the observed graph, namely, either
$P(X_{ij}, X_{ji}) = P^{(\omega)}(X_{ij}, X_{ji}) = 1$, when $i$ and $j$ have a
mutual connection, or $P(X_{ij}, X_{ji}) = P^{(\omega)}(X_{ij}, X_{ji}) = 0$,
otherwise, where the superscript $\omega$ indicates that the
probability is obtained from the observed graph. On the other
hand, the distribution for each individual edge is measured
by $P(X_{ij}) = P^{(\mu)}(X_{ij}) = \frac{d_i}{|V|-1}$, where $d_i$ is the out-going
degree of node $i$. $P^{(\mu)}(X_{ij})$ represents the probability of edge
$i \to j$ being generated under a random graph model, denoted
by the superscript $\mu$, with edges randomly generated while
preserving the out-degrees. Hence, the tendency $\rho_{ij}$ is obtained
by implicitly comparing the observed graph with a reference
random digraph model.

Limitations of Wolfe's mutuality tendency measure for 
stable social structure clustering. Although the node pair 
in a dyad is unordered (i.e., the two nodes are treated "sym- 
metrically" in terms of dyadic relations), Wolfe's measure of 
mutual tendency is in fact asymmetric. This can be easily seen 
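through the following derivation. By definition,

$$P(X_{ji} \mid X_{ij}) = P(X_{ji}) + \rho_{ij}\,P(\bar{X}_{ji}),$$

$$P(X_{ij} \mid X_{ji}) = P(X_{ij}) + \rho_{ji}\,P(\bar{X}_{ij}).$$

Multiplying the above two equations by $P(X_{ij})$ and $P(X_{ji})$,
respectively, and using eq.(1), both left-hand sides become
$P(X_{ij}, X_{ji})$, so that $\rho_{ij} P(X_{ij}) P(\bar{X}_{ji}) = \rho_{ji} P(X_{ji}) P(\bar{X}_{ij})$, and we have

$$\frac{\rho_{ij}}{\rho_{ji}} = \frac{P(X_{ji})\,P(\bar{X}_{ij})}{P(X_{ij})\,P(\bar{X}_{ji})} = \frac{P(X_{ji}) - P(X_{ij})\,P(X_{ji})}{P(X_{ij}) - P(X_{ij})\,P(X_{ji})}.$$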

We see that $\rho_{ij} = \rho_{ji}$ if and only if $P(X_{ij}) = P(X_{ji})$ holds.
Hence, given an arbitrary dyad in a social network, Wolfe's
measure of mutuality tendency of the node pair is asymmetric,
in the sense that it is a node-specific measure of mutuality
tendency. It does not provide a measure of mutuality tendency
of the (unordered) node pair viewed together. While such an
asymmetric (node-specific) measure of mutuality tendency can
be useful in some social network analysis, as will be clear later,
such an asymmetric measure poses difficulty in identifying and
extracting stable cluster structures in social networks. For
instance, given a partition $V = (S, \bar{S})$ of a digraph, generalizing
Wolfe's measure to clusters, the mutuality tendencies across
the two clusters, denoted by $\rho(S, \bar{S}) = \sum_{i \in S,\, j \in \bar{S}} \rho_{ij}$ and
$\rho(\bar{S}, S) = \sum_{i \in \bar{S},\, j \in S} \rho_{ij}$, are generally not symmetric, namely,
$\rho(S, \bar{S}) \neq \rho(\bar{S}, S)$. In Section III, we will introduce a new
measure of mutuality tendency that is symmetric and captures 
the tendency of a node pair in a dyadic relation to establish 
mutual connection. This measure of mutuality tendency can be 
applied to clusters and a whole network in a straightforward 
fashion, and leads us to develop a mutuality-tendency-aware 
spectral clustering algorithm. 
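The asymmetry of Wolfe's measure can be illustrated numerically. The sketch below simply evaluates eq.(2) for generic probabilities; the function name `wolfe_rho` and the probability values are hypothetical choices for illustration and are not taken from any dataset in the paper.

```python
def wolfe_rho(p_joint, p_ij, p_ji):
    """Wolfe's tendency rho_ij from eq.(2), given P(X_ij, X_ji), P(X_ij), P(X_ji)."""
    return (p_joint - p_ij * p_ji) / (p_ij * (1.0 - p_ji))

# Hypothetical probabilities with P(X_ij) != P(X_ji).
p_joint, p_ij, p_ji = 0.3, 0.5, 0.4

rho_ij = wolfe_rho(p_joint, p_ij, p_ji)   # tendency of j to reciprocate i
rho_ji = wolfe_rho(p_joint, p_ji, p_ij)   # tendency of i to reciprocate j

# Ratio predicted by the derivation: P(X_ji) P(not X_ij) / (P(X_ij) P(not X_ji)).
predicted_ratio = (p_ji * (1.0 - p_ij)) / (p_ij * (1.0 - p_ji))
```

Here `rho_ij` and `rho_ji` differ, and their ratio matches `predicted_ratio`; the two values coincide only when the two marginal probabilities are equal.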

B. Spectral Clustering Theory and Extensions to Digraphs via Symmetrization

Spectral clustering methods (see, e.g., [15], [22], [26],
[27], [29]) were originally developed for clustering data with
symmetric relations, namely, data that can be represented as
undirected graphs, where each relation (edge) between two
entities, $A_{ij} = A_{ji}$, represents their similarity. The goal is
to partition the graph such that entities within each cluster 
are more similar to each other than those across clusters. 
This is done by minimizing the total weight of cross-cluster 
edges (possibly weighted by the total weight of edges within 
clusters). In the following we present the basics of spectral 
clustering theory (see [25] for more details). 



Given the (non-negative) similarity matrix $A$, the cut function
quantitatively measures the quality of a partition
$V = (S_1, \cdots, S_K)$, and is defined as follows:

$$Cut(S_i, \bar{S}_i) := \sum_{j \in S_i,\, l \in \bar{S}_i} A_{jl},$$

$$Cut(S_1, \cdots, S_K) := \sum_{i=1}^{K} Cut(S_i, \bar{S}_i).$$

To account for cluster sizes, especially to obtain relatively
balanced clusters (in terms of sizes), the ratio cut function
RCut [5] and the normalized cut function NCut [22] have
also been defined:

$$RCut(S_1, \cdots, S_K) := \sum_{i=1}^{K} \frac{Cut(S_i, \bar{S}_i)}{|S_i|},$$

$$NCut(S_1, \cdots, S_K) := \sum_{i=1}^{K} \frac{Cut(S_i, \bar{S}_i)}{vol(S_i)},$$

where $vol(S_i) = \sum_{j \in S_i} d_j$ is the volume of the cluster $S_i$.

In the following (and the remainder of the paper), we will 
use the ratio cut function as the objective function. All the 
results also hold true for the normalized cut. Using the ratio 
cut, the clustering problem formulated as a graph mincut 
optimization problem can be rewritten in the following form: 



$$\min_{S_1, \cdots, S_K} RCut(S_1, \cdots, S_K). \qquad (3)$$



The (unnormalized) Laplacian matrix $L = D - A$ is used
to solve the above mincut problem, where $D = \mathrm{diag}[d_i]$
with $d_i = \sum_j A_{ij}$ is the diagonal degree matrix. Given
a (nonnegative) symmetric $A$, $L$ is symmetric and positive
semi-definite. If we take the $K$ eigenvectors corresponding to
the $K$ smallest eigenvalues of $L$, the optimal solution to the
problem in eq.(3), namely, the optimal partition into $K$ clusters,
can be well approximated by applying the K-means algorithm
to cluster the data points projected onto the subspace formed
by these $K$ eigenvectors [25]. Moreover, [13] provides a
systematic study comparing a wide range of undirected-graph-based
clustering algorithms on large real datasets,
which gives a useful guideline on how to select clustering
algorithms based on the underlying networks and the target
objectives.
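As an illustration of the classical procedure for the special case $K = 2$, the following is a minimal sketch. It is an assumption-laden simplification: instead of running K-means on the spectral embedding, it thresholds the eigenvector of the second-smallest eigenvalue of $L$ at zero, a common shortcut for bipartitioning. The function names and the toy graph (two triangles joined by one edge) are hypothetical.

```python
import numpy as np

def ratio_cut(A, labels):
    """RCut = sum over clusters of Cut(S_i, complement of S_i) / |S_i|, for symmetric A."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        inside = labels == k
        total += A[np.ix_(inside, ~inside)].sum() / inside.sum()
    return total

def spectral_bipartition(A):
    """Split an undirected graph in two using the unnormalized Laplacian L = D - A."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues are returned in ascending order
    second = eigvecs[:, 1]                # eigenvector of the second-smallest eigenvalue
    return (second > 0).astype(int)       # sign threshold (assumption: replaces K-means)

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A_example = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A_example[a, b] = A_example[b, a] = 1.0

labels = spectral_bipartition(A_example)
rcut = ratio_cut(A_example, labels)
```

On this toy graph the sign pattern separates the two triangles, which is also the bipartition with the single cross-cluster edge.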

Extensions to digraphs via symmetrization. When relations 
between entities are asymmetric, or the underlying graph is 
directed, spectral clustering cannot be directly applied, as the 
notion of (semi-)definiteness is only defined for symmetric 
matrices. Several recent studies (see, e.g., [11], [21], [27],
[29]) attempt to circumvent this difficulty by first converting
the underlying digraphs to undirected graphs via some form
of symmetrization, and then applying spectral clustering to the
resulting symmetrized (undirected) graphs. For example, the
authors in [21] discuss several symmetrization methods,
including the symmetrized adjacency matrix $\bar{A} = (A + A^T)/2$,
the bibliographic coupling matrix $AA^T$ and the co-citation
strength matrix $A^T A$, and so forth. Symmetrization can also
be done through a random walk on the underlying graph,
where $P = D^{-1} A$ is the probability transition matrix and
$D = \mathrm{diag}[d_i^{out}]$ is the diagonal matrix of node out-degrees. For
example, taking the objective function as the random walk flow
circulation matrix $F^{\pi} = \Pi P$, where $\Pi$ is the diagonal
stationary distribution matrix, we have the symmetrized Laplacian of
the circulation matrix as

$$\bar{\mathcal{L}} = \frac{\mathcal{L} + \mathcal{L}^T}{2} = I - \frac{\Pi^{1/2} P \Pi^{-1/2} + \Pi^{-1/2} P^T \Pi^{1/2}}{2},$$

where $\mathcal{L}$ is the (asymmetric) digraph Laplacian matrix [14].
The classical spectral clustering algorithm can then be
applied using $\bar{\mathcal{L}}$, which is symmetric and positive semi-definite. Zhou
et al. [27], [29] use this type of symmetrization to perform
clustering on digraphs. Moreover, Leicht and Newman [11]
propose the digraph modularity matrix $Q = [Q_{ij}]$, which
captures the difference between the observed digraph and a
hypothetical random graph with edges randomly generated
while preserving the in- and out-degrees of nodes, namely,
$Q_{ij} = A_{ij} - d_i^{out} d_j^{in}/m$, where $m$ here denotes the total number
of edges. If the sum of edge modularities
in a cluster $S$ is large, nodes in $S$ are well connected, since
the edges in $S$ tend to appear with higher probabilities than
would occur by chance. However, $Q$ is by definition asymmetric,
so [11] uses the symmetrized matrix $\bar{Q} = (Q + Q^T)/2$ as the objective
for spectral clustering. Essentially, the edge
modularity captures how much more frequently an individual edge
appears than would happen by chance; thus the modularity-based
clustering method tends to group together those nodes with
more connections than expected. Like all the other
clustering methods presented above, it completely ignores the
distinction between mutual and one-way connections.

Problem definition: Clustering and identifying stable clusters
in social networks with mutual and one-way connections.
As discussed earlier, one-way and mutual dyadic
connections in social networks often represent different states
or types of social ties and exhibit different stabilities over
time. Hence when performing clustering to extract community
structures in social networks, one-way and mutual connections
should be distinguished and treated differently. Existing
digraph clustering methods via symmetrization, e.g., those
mentioned above, on the other hand, ignore these different
types of connections and treat them as the same: the process
of symmetrization essentially weighs one-way connections
as a fraction of mutual connections, and then attempts to
minimize the total weight of the (symmetrized) cross-cluster
links. Moreover, different from Leicht and Newman's [11]
reference random graph model, the mutuality tendency presented
in Section II-A compares the observed digraph with
a random graph model where edges are randomly generated
by preserving only the out-degrees, which better reflects the
underlying model of how social network users establish social
ties.
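The symmetrizations surveyed above are easy to compare on a small example. The following is a minimal sketch, with a hypothetical function name and toy digraph; it only illustrates that averaging $A$ and $A^T$ gives a one-way link half the weight of a mutual link, so the two kinds of connection differ by a constant factor rather than being treated as different types of ties.

```python
import numpy as np

def symmetrizations(A):
    """Common symmetrizations of a digraph adjacency matrix A (0-1, no self-loops)."""
    A = np.asarray(A, dtype=float)
    d_out = A.sum(axis=1)                  # out-degrees
    d_in = A.sum(axis=0)                   # in-degrees
    n_edges = A.sum()                      # total number of directed edges
    Q = A - np.outer(d_out, d_in) / n_edges   # directed modularity matrix
    return {
        "average": (A + A.T) / 2.0,        # symmetrized adjacency matrix
        "bibliographic": A @ A.T,          # bibliographic coupling
        "cocitation": A.T @ A,             # co-citation strength
        "modularity": (Q + Q.T) / 2.0,     # symmetrized modularity
    }

# Hypothetical toy digraph: 0 <-> 1 is mutual, 1 -> 2 is one-way.
A_example = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
])
sym = symmetrizations(A_example)
mutual_weight = sym["average"][0, 1]
one_way_weight = sym["average"][1, 2]
```

In this example `mutual_weight` is twice `one_way_weight`, which is the "fraction of mutual connections" weighting described in the problem definition.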

In this paper we want to solve the following clustering 
problem in social networks with bi- and uni-directional links: 
Given a directed (social) graph where mutual connections 
represent more stable relations and one-way connections
represent intermediate transitional states, how can we account for
mutuality tendencies of dyadic relations and cluster the entities
in such a way that nodes within each cluster have maximized 
mutuality tendencies to establish mutual connections, while 
across clusters, nodes have minimized tendencies to establish 
mutual connections? The clusters (representing social struc- 
tures or communities) identified and extracted thereof will 
hence likely be more stable. 

III. Cluster-based Mutuality Tendency Theory 

Inspired by Wolfe's study in [28], we propose a new mea- 
sure of mutuality tendency for dyads that can be generalized to 
groups of nodes (clusters), and develop a mutuality tendency 
theory for characterizing the strength of social ties within a 
cluster (network structure) as well as across clusters in an 
asymmetric social graph. This theory lays the theoretical foun- 
dation for the network structure classification and community 
detection algorithms we will develop in section IV. 

A. Cluster based mutuality tendency 

Let $X_{ij}$ denote the random variable that represents whether
or not node $i$ places a directed edge to node $j$. There are
only two possible events (i.e., $X_{ij}$ takes two possible values):
$X_{ij} = 1$, representing that the edge is present; or $X_{ij} = 0$, that
the edge is not present. Let $X_{ij}$ (resp. $\bar{X}_{ij}$) also denote the event
$\{X_{ij} = 1\}$ (resp. $\{X_{ij} = 0\}$). Given an observed (asymmetric) social
graph $G$, to capture the mutuality tendency of dyads in this
graph, we compare it with a hypothetical, random (social)
graph, denoted as $G^{(\mu)}$, where links (dyadic relations) are
generated randomly (i.e., by chance) in such a manner that
the (out-)degree $d_i$ of each node $i$ in $G^{(\mu)}$ is the same as
that in the observed social graph $G$. Under this random social
graph model, the probability of the event $X_{ij}$ occurring is
$P^{(\mu)}(X_{ij}) = \frac{d_i}{|V|-1}$; namely, $i$ places a (directed) link to
node $j$ randomly or by chance (the superscript $\mu$ indicates the
probability distribution of link generations under the random
social graph model). The probability that $i$ places a directed
edge to $j$ and $j$ reciprocates back (i.e., node $i$ and node $j$
are mutually connected) is thus given by
$P^{(\mu)}(X_{ij}, X_{ji}) = P^{(\mu)}(X_{ij})\,P^{(\mu)}(X_{ji} \mid X_{ij}) = P^{(\mu)}(X_{ij})\,P^{(\mu)}(X_{ji})$,
since $X_{ij}$ and $X_{ji}$ are independent under the random social graph model.
On the observed social graph, let $P^{(\omega)}(X_{ij}, X_{ji})$ indicate
whether there is a mutual connection (symmetric
link) between node $i$ and node $j$, i.e., $P^{(\omega)}(X_{ij}, X_{ji}) = 1$ if
the dyad $Dy_{ij}$ is a mutual dyad in the observed social graph,
and $P^{(\omega)}(X_{ij}, X_{ji}) = 0$ otherwise. We define the mutuality
tendency of dyad $Dy_{ij}$ as follows:

$$\theta_{ij} := P^{(\omega)}(X_{ij}, X_{ji}) - P^{(\mu)}(X_{ij})\,P^{(\mu)}(X_{ji}), \qquad (4)$$

which captures how much more frequently the node pair $i$ and $j$
establish a mutual dyad than would occur by chance.

This definition of mutuality tendency is a symmetric measure
for dyad $Dy_{ij}$, i.e., $\theta_{ij} = \theta_{ji}$. In addition, it can be shown that
$\theta_{ij} \in [-1, 1]$. We remark that $\theta_{ij} = 0$ indicates that if node $i$
places a directed link to node $j$, the tendency that node $j$ will
reciprocate back to node $i$ is no more likely than would occur
by chance; the same holds true if node $j$ places a directed link
to node $i$ instead. On the other hand, $\theta_{ij} > 0$ indicates that if
node $i$ (resp. node $j$) places a directed link to node $j$ (resp.
node $i$), node $j$ (resp. node $i$) is more likely than by chance
to reciprocate. In particular, with $\theta_{ij} = 1$, node $j$ (resp. node
$i$) will almost surely reciprocate. In contrast, $\theta_{ij} < 0$ indicates
that if node $i$ (resp. node $j$) places a directed link to node $j$
(resp. node $i$), node $j$ (resp. node $i$) will tend not to reciprocate
back to node $i$ (resp. node $j$). In particular, with $\theta_{ij} = -1$,
node $j$ (resp. node $i$) will almost surely not reciprocate back.
Hence $\theta_{ij}$ provides a measure of strength of social ties between
node $i$ and $j$: $\theta_{ij} > 0$ suggests that the dyadic relation between
node $i$ and $j$ is stronger, having a higher tendency (than by
chance) to become mutual; whereas $\theta_{ij} < 0$ suggests that node
$i$ and $j$ have weaker social ties, and their dyadic relation is
likely to remain asymmetric or eventually disappear.

Mutuality tendency of clusters. The mutuality tendency
measure for dyads defined in eq.(4) can be easily generalized
to an arbitrary cluster (a subgraph) in an observed social
graph, $S \subseteq G$. We define the mutuality tendency of a cluster
$S$, $\Theta_S$, as follows:

$$\Theta_S := \sum_{i \sim j;\, i,j \in S} \theta_{ij} = \sum_{i \sim j;\, i,j \in S} P^{(\omega)}(X_{ij}, X_{ji}) - \sum_{i \sim j;\, i,j \in S} P^{(\mu)}(X_{ij})\,P^{(\mu)}(X_{ji}), \qquad (5)$$

where the subscript $i \sim j;\, i,j \in S$ means that the summation
accounts for all (unordered) dyads, and $i, j$ are both in $S$.
Denote the second term in eq.(5) as $m_S^{(\mu)}$, and the (out-degree)
volume of the cluster $S$ as $d_S := \sum_{i \in S} d_i$. As $P^{(\mu)}(X_{ij}) =
d_i/(|V|-1)$ and $P^{(\mu)}(X_{ji}) = d_j/(|V|-1)$,

$$m_S^{(\mu)} = \sum_{i \sim j;\, i,j \in S} \frac{d_i d_j}{(|V|-1)^2} = \frac{d_S^2 - \sum_{i \in S} d_i^2}{2(|V|-1)^2}, \qquad (6)$$

which represents the expected number of mutual connections
among nodes in $S$ under the random social graph model.
Given the cluster $S$ in the observed social graph $G$, define
$m_S^{(\omega)} := \sum_{i \sim j;\, i,j \in S} P^{(\omega)}(X_{ij}, X_{ji})$; namely, $m_S^{(\omega)}$ represents
the number of (observed) mutual connections among nodes
in the cluster $S$ in the observed social graph $G$. The mutuality
tendency of cluster $S$ defined in eq.(5) is therefore exactly
$\Theta_S = m_S^{(\omega)} - m_S^{(\mu)}$.

Hence $\Theta_S$ provides a measure of strength of (likely mutual)
social ties among nodes in a cluster: $\Theta_S > 0$ suggests that
there are more mutual connections among nodes in $S$ than
would occur by chance; whereas $\Theta_S < 0$ suggests that there
are fewer mutual connections among nodes in $S$ than would
occur by chance. Using $\Theta_S$, we can therefore quantify and
detect clusters of nodes (network structures or communities)
that have strong social ties.
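The dyad and cluster tendencies of eqs.(4)-(6) can be computed directly from the adjacency matrix. The sketch below is one possible way to do so, under the out-degree-preserving random model described above; the function names and the toy digraph are hypothetical and chosen only for illustration.

```python
import numpy as np

def tendency_matrix(A):
    """theta[i, j] of eq.(4): observed mutual indicator minus d_i d_j / (|V| - 1)^2."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    p = A.sum(axis=1) / (n - 1)             # P(X_ij) = d_i / (|V| - 1) in the random model
    theta = np.minimum(A, A.T) - np.outer(p, p)
    np.fill_diagonal(theta, 0.0)            # no self-loops, so no dyad (i, i)
    return theta

def cluster_tendency(A, S):
    """Theta_S of eq.(5): sum of theta over unordered dyads inside S."""
    S = np.asarray(S)
    theta = tendency_matrix(A)
    return theta[np.ix_(S, S)].sum() / 2.0  # each unordered dyad appears twice

def expected_mutual(A, S):
    """Closed form of eq.(6) for the expected number of mutual dyads inside S."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)[np.asarray(S)]
    return (d.sum() ** 2 - (d ** 2).sum()) / (2.0 * (n - 1) ** 2)

# Hypothetical toy digraph: 0 <-> 1 is mutual, 1 -> 2 and 2 -> 3 are one-way.
A_example = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])
theta_S = cluster_tendency(A_example, [0, 1])          # cluster containing the mutual dyad
theta_G = cluster_tendency(A_example, [0, 1, 2, 3])    # whole graph
```

For the cluster {0, 1} the observed count is one mutual dyad, and subtracting `expected_mutual(A_example, [0, 1])` reproduces `theta_S`, in agreement with $\Theta_S = m_S^{(\omega)} - m_S^{(\mu)}$.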

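In particular, when $S = G$, $\Theta_G$ characterizes the mutuality
tendency for the entire digraph $G$, i.e.,

$$\Theta_G = m_G^{(\omega)} - m_G^{(\mu)} = \sum_{i \sim j} \theta_{ij}, \qquad (7)$$

where $m_G^{(\omega)} := \sum_{i \sim j} P^{(\omega)}(X_{ij}, X_{ji})$ represents the number
of (observed) mutual dyads among nodes in the observed
social graph $G$, and

$$m_G^{(\mu)} = \sum_{i \sim j} \frac{d_i d_j}{(|V|-1)^2} = \frac{d_G^2 - \sum_{i \in V} d_i^2}{2(|V|-1)^2}, \qquad d_G := \sum_{i \in V} d_i, \qquad (8)$$

represents the expected number of mutual dyads among nodes
in $G$ under the random social graph model. Likewise, given a
bipartition $(S, \bar{S})$ of $G$, we define the cross-cluster mutuality
tendency as

$$\Theta_{\partial S} := \sum_{i \in S,\, j \in \bar{S}} \theta_{ij} = \sum_{i \in S,\, j \in \bar{S}} P^{(\omega)}(X_{ij}, X_{ji}) - \sum_{i \in S,\, j \in \bar{S}} P^{(\mu)}(X_{ij})\,P^{(\mu)}(X_{ji}). \qquad (9)$$

Denote the second quantity in eq.(9) as $m_{\partial S}^{(\mu)}$:

$$m_{\partial S}^{(\mu)} = \sum_{i \in S,\, j \in \bar{S}} \frac{d_i d_j}{(|V|-1)^2} = \frac{d_S\, d_{\bar{S}}}{(|V|-1)^2}, \qquad (10)$$

which represents the expected number of mutual connections
among nodes across $S$ and $\bar{S}$ under the random social graph
model. Define $m_{\partial S}^{(\omega)} := \sum_{i \in S,\, j \in \bar{S}} P^{(\omega)}(X_{ij}, X_{ji})$, representing
the number of (observed) mutual connections among nodes
across clusters $S$ and $\bar{S}$ in the observed social graph $G$. The
mutuality tendency across clusters $S$ and $\bar{S}$ defined in eq.(9) is
therefore exactly $\Theta_{\partial S} = m_{\partial S}^{(\omega)} - m_{\partial S}^{(\mu)}$.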

The mutuality tendency theory outlined above accounts 
for different interpretations and roles mutual and one-way 
connections represent and play in asymmetric social graphs, 
with the emphasis in particular on the importance of mutual 
connections in forming and developing stable social struc- 
tures/communities with strong social ties. In the next section, 
we will show how we can apply this mutuality tendency 
theory for detecting and clustering stable network structures 
and communities in asymmetric social graphs. 

IV. Mutuality-tendency-aware Spectral Clustering Algorithm

In this section, we first consider the simpler case of 
mutuality-tendency-aware clustering problem with K = 2 and 
establish the basic theory and algorithm. We then extend it to 
the general case with K > 2. 

A. Mutuality-tendency-aware spectral clustering: K=2 

Without loss of generality, we consider only simple
(unweighted) digraphs $G = (V, E)$ (i.e., the adjacency matrix
$A$ is a 0-1 matrix). Define the mutual connection matrix
$M := \min(A, A^T)$, which expresses all the mutual
connections with unit weight 1. In other words, if node $i$ and
node $j$ are mutually connected (with bidirectional links),
$M_{ij} = M_{ji} = 1$; otherwise, $M_{ij} = M_{ji} = 0$. Hence, we
have $M_{ij} = P^{(\omega)}(X_{ij}, X_{ji})$, representing whether
there is a mutual connection (symmetric link) between node
$i$ and node $j$, i.e., in the dyad $Dy_{ij}$ in the observed social
graph. In addition, let $\delta_{ij}$ be the Kronecker delta symbol, i.e.,
$\delta_{ij} = 1$ if $i = j$, and $\delta_{ij} = 0$ otherwise. Then, we define the
matrix

$$\bar{M} = \frac{d d^T - \mathrm{diag}[d^2]}{(|V|-1)^2},$$

with $d$ as the out-going degree vector, where each entry

$$\bar{M}_{ij} = \frac{d_i d_j - \delta_{ij} d_i^2}{(|V|-1)^2} = \begin{cases} 0 & \text{if } i = j, \\ \dfrac{d_i d_j}{(|V|-1)^2} & \text{if } i \neq j, \end{cases} \qquad (11)$$

represents the probability that two nodes $i$ and $j$ independently
place two unidirectional links to each other to form a mutual
dyad. Hence, $\bar{M}_{ij} = P^{(\mu)}(X_{ij})\,P^{(\mu)}(X_{ji})$ represents the
probability of node pair $i$ and $j$ establishing a mutual connection
under the random graph model with edges randomly generated by
preserving the node out-degrees. We denote $T = M - \bar{M}$ as
the mutuality tendency matrix, with each entry

$$T_{ij} = P^{(\omega)}(X_{ij}, X_{ji}) - P^{(\mu)}(X_{ij})\,P^{(\mu)}(X_{ji}) = \theta_{ij} \qquad (12)$$

as the individual dyad mutuality tendency.

Mutuality Tendency Laplacian. $T$ is symmetric and those
entries associated with non-mutual dyads are negative, representing
lower tendencies to establish mutual connections
than would occur by chance. Define the mutuality tendency
Laplacian matrix as

$$L_T := D_T - T, \qquad (13)$$

where $D_T = \mathrm{diag}[d_T(i)]$ is the diagonal degree matrix of
$T$, with $d_T(i) = \sum_j T_{ij}$. We have the following theorem
presenting several properties of $L_T$.

Theorem 1. The mutuality tendency Laplacian matrix $L_T$ as
defined in eq.(13) has the following properties:

• Given a column vector $x \in \mathbb{R}^{|V|}$, the bilinear form
$x^T L_T x$ satisfies

$$x^T L_T x = \frac{1}{2} \sum_{i,j \in V} T_{ij} (x_i - x_j)^2. \qquad (14)$$

• $L_T$ is symmetric and in general indefinite. In addition,
$L_T$ has one eigenvalue equal to 0, with corresponding
eigenvector $\mathbf{1} = [1, \cdots, 1]^T$.

Proof: (1) By expanding the bilinear form $x^T L_T x$ and using the
symmetry $T_{ij} = T_{ji}$,

$$x^T L_T x = \sum_{i,j \in V} T_{ij} (x_i^2 - x_i x_j) = \frac{1}{2} \sum_{i,j \in V} T_{ij} (x_i - x_j)^2.$$

(2) The symmetry of both $M$ and $\bar{M}$ in eq.(12) ensures the
symmetry of $L_T$, thus $L_T$ has all real eigenvalues. However,
$L_T$ is in general indefinite, because $T_{ij}$ in eq.(14) could be
either positive or negative. On the other hand, since
$\mathbf{1}^T L_T = 0$ and $L_T \mathbf{1} = 0$ hold true, $L_T$ has an
eigenvalue equal to 0, with the all-ones vector
$\mathbf{1} = [1, \cdots, 1]^T$ as a corresponding eigenvector.
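
The two properties in Theorem 1 can be checked numerically. The sketch below builds the tendency Laplacian for a small random digraph and compares both sides of the quadratic-form identity; the random graph, the seed and the $(n-1)^2$ normalisation of the expected mutual matrix are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = (rng.random((n, n)) < 0.4).astype(float)   # random 0-1 digraph
np.fill_diagonal(A, 0.0)                       # no self-loops

d = A.sum(axis=1)
M_bar = np.outer(d, d) / (n - 1) ** 2          # assumed normalisation
np.fill_diagonal(M_bar, 0.0)
T = np.minimum(A, A.T) - M_bar                 # mutuality tendency matrix
L_T = np.diag(T.sum(axis=1)) - T               # L_T = D_T - T

x = rng.standard_normal(n)
quad = x @ L_T @ x
# Sum over ordered pairs (i, j); the factor 1/2 counts each dyad once.
pair_sum = 0.5 * np.sum(T * (x[:, None] - x[None, :]) ** 2)

eigvals = np.linalg.eigvalsh(L_T)              # real, since L_T is symmetric
print(np.isclose(quad, pair_sum), np.allclose(L_T @ np.ones(n), 0.0))
print(eigvals)
```

Depending on the graph, the printed spectrum may contain negative values, which is the indefiniteness mentioned in the theorem.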



Mutuality tendency ratio cut function. For a digraph $G = (V, E)$
and a partition $V = (S, \bar{S})$ on $G$, we define the
mutuality tendency ratio cut function as follows:

$$TRCut(S, \bar{S}) = \Theta_{\partial S} \left( \frac{1}{|S|} + \frac{1}{|\bar{S}|} \right) \qquad (15)$$

which represents the overall mutuality tendency across clusters
balanced by the "sizes" of the clusters. Then, the clustering
problem is formulated as a minimization problem with $K = 2$
clusters. (More general cases with $|V| > K > 2$ will be
discussed in the next subsection.)

$$\min_{S} TRCut(S, \bar{S}) \qquad (16)$$

Since $\Theta_{\partial S} = \Theta_G - (\Theta_S + \Theta_{\bar{S}})$ holds true, we have

$$TRCut(S, \bar{S}) = \left( \Theta_G - (\Theta_S + \Theta_{\bar{S}}) \right) \left( \frac{1}{|S|} + \frac{1}{|\bar{S}|} \right).$$

For a given graph $G$, the graph mutuality tendency $\Theta_G$ is a
constant, so the minimization problem in eq.(16) is equivalent to
the following maximization problem:

$$\max_{S} \left\{ \left( \Theta_S + \Theta_{\bar{S}} - \Theta_G \right) \left( \frac{1}{|S|} + \frac{1}{|\bar{S}|} \right) \right\} \qquad (17)$$

Hence, minimizing the cross-cluster mutuality tendency is
equivalent to maximizing the within-cluster mutuality tendency.
Using the results presented in Theorem 1, we prove the
following theorem, which provides the solution to the above
mutuality tendency optimization problem.
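
Before turning to the theorem, the cut quantities can be illustrated by brute force on a toy graph. The sketch below assumes that a cluster tendency is the sum of the dyad tendencies over the dyads inside the cluster, and that the cross-cluster tendency is the sum over the dyads joining the two sides; the helper names and the four-node graph are illustrative only. It checks the decomposition of the graph tendency into within-cluster and cross-cluster parts for every bipartition and reports the partition with the smallest ratio cut.

```python
from itertools import combinations
import numpy as np

def tendency_matrix(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)
    M_bar = np.outer(d, d) / (n - 1) ** 2      # assumed normalisation
    np.fill_diagonal(M_bar, 0.0)
    return np.minimum(A, A.T) - M_bar

def theta_within(T, S):
    S = list(S)
    return 0.5 * T[np.ix_(S, S)].sum()         # each dyad counted once

def theta_cross(T, S):
    S = list(S)
    rest = [v for v in range(T.shape[0]) if v not in S]
    return T[np.ix_(S, rest)].sum()

def tr_cut(T, S):
    n = T.shape[0]
    return theta_cross(T, S) * (1.0 / len(S) + 1.0 / (n - len(S)))

# Toy digraph: 0 <-> 1 and 2 <-> 3 are mutual; 1 -> 2 and 3 -> 0 are one-way.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
T = tendency_matrix(A)
n = A.shape[0]
theta_G = 0.5 * T.sum()

cuts = {}
for size in range(1, n):
    for S in combinations(range(n), size):
        if 0 not in S:                         # skip mirrored partitions
            continue
        rest = tuple(v for v in range(n) if v not in S)
        # Cross-cluster tendency = graph tendency minus within-cluster parts.
        assert np.isclose(theta_cross(T, S),
                          theta_G - theta_within(T, S) - theta_within(T, rest))
        cuts[S] = tr_cut(T, S)

best = min(cuts, key=cuts.get)
print(best, cuts[best])
```

On this toy graph the minimum is attained by separating the two mutual pairs, which keeps only one-way dyads across the cut. Exhaustive search is only feasible for very small graphs.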

Theorem 2. Given the tendency Laplacian matrix $L_T = D_T - T$,
the signs of the eigenvector of $L_T$ corresponding to
the smallest non-zero eigenvalue indicate the optimal solution
$(S, \bar{S})$ to the optimization problem eq.(16).

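Proof: Define the column vector $f_S = [f_S(1), \cdots, f_S(n)]^T$
with respect to a partition $S \cup \bar{S} = V$ as follows:

$$f_S(i) = \begin{cases} \sqrt{|\bar{S}| / |S|} & \text{if } i \in S \\ -\sqrt{|S| / |\bar{S}|} & \text{if } i \in \bar{S} \end{cases} \qquad (18)$$

Then, by applying Theorem 1, we have

$$f_S^T L_T f_S = \frac{1}{2} \sum_{i,j \in V} T_{ij} \left( f_S(i) - f_S(j) \right)^2 = \left( \frac{|\bar{S}|}{|S|} + \frac{|S|}{|\bar{S}|} + 2 \right) \sum_{i \in S,\, j \in \bar{S}} T_{ij} = |V| \, \Theta_{\partial S} \left( \frac{1}{|S|} + \frac{1}{|\bar{S}|} \right).$$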

In addition, we have $f_S^T f_S = \|f_S\|^2 = |V|$. Hence, the
Rayleigh quotient for $L_T$ is

$$\frac{f_S^T L_T f_S}{f_S^T f_S} = TRCut(S, \bar{S}) \geq \lambda(L_T), \qquad (19)$$

where $\lambda(L_T)$ is the smallest non-zero eigenvalue of $L_T$.
Here $\lambda(L_T)$ cannot be 0, because we have the constraint
$f_S \perp \mathbf{1}$. From Theorem 1, $\mathbf{1}$ is an eigenvector
associated with eigenvalue 0. Hence, the problem of minimizing
eq.(16) can be equivalently rewritten as

$$\min_{S} f_S^T L_T f_S, \quad \text{s.t. } f_S \perp \mathbf{1} \text{ in the form of eq.(18)}, \ \|f_S\|^2 = |V|.$$



Since the entries of the solution vector $f_S$ are only allowed to
take values in the form of eq.(18), this is a discrete optimization
problem, which is known to be NP-hard [25]. By relaxing
the discreteness condition and allowing $f_S(i)$ to take arbitrary
values in $\mathbb{R}$, we have the following relaxed optimization
problem:

$$\min_{f_S \in \mathbb{R}^{|V|}} f_S^T L_T f_S, \quad \text{s.t. } f_S \perp \mathbf{1} \text{ and } \|f_S\|^2 = |V|.$$

The solution to this problem, i.e., the vector $f_S$, is the
eigenvector corresponding to the smallest non-zero eigenvalue
$\lambda(L_T)$. Hence, we can approximate the minimizer of
$TRCut(S, \bar{S})$ using the eigenvector corresponding to $\lambda(L_T)$.
To obtain a partition of the graph, we need to convert the
real-valued solution vector $f_S$ of the relaxed problem to an
indicator vector. One way to do this [25] is to use the signs
of $f_S$ as the indicator function, where node $v_i \in S$ if $f_S(i) > 0$,
and $v_i \in \bar{S}$ otherwise. ■
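
The sign-based rounding can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' implementation: `bipartition` is a hypothetical helper, the expected mutual matrix uses an $(n-1)^2$ normalisation, the zero eigenvalue is assumed to be simple, and "smallest non-zero eigenvalue" is read as the algebraically smallest eigenvalue once the all-ones eigenvector is set aside (it can be negative because the Laplacian is indefinite).

```python
import numpy as np

def tendency_laplacian(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)
    M_bar = np.outer(d, d) / (n - 1) ** 2      # assumed normalisation
    np.fill_diagonal(M_bar, 0.0)
    T = np.minimum(A, A.T) - M_bar
    return np.diag(T.sum(axis=1)) - T

def bipartition(A):
    """Return (boolean membership of S, eigenvalue used for the split)."""
    L_T = tendency_laplacian(A)
    n = L_T.shape[0]
    eigvals, eigvecs = np.linalg.eigh(L_T)     # eigenvalues in ascending order
    ones = np.ones(n) / np.sqrt(n)
    # Index of the trivial eigenvector (parallel to the all-ones vector).
    trivial = int(np.argmax(np.abs(eigvecs.T @ ones)))
    k = next(i for i in range(n) if i != trivial)
    f = eigvecs[:, k]
    return f > 0, eigvals[k]

# Toy digraph: 0 <-> 1 and 2 <-> 3 are mutual; 1 -> 2 and 3 -> 0 are one-way.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
in_S, lam = bipartition(A)
print(in_S, lam)
```

Because an eigenvector is only defined up to sign, which of the two groups is labelled `True` is arbitrary.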

B. Mutuality-tendency-aware spectral clustering: K > 2 

For the case of finding $K > 2$ clusters $S_1 \cup \cdots \cup S_K = V$,
we define the indicator vectors $h_k = (h_{1k}, \cdots, h_{nk})^T$,

$$h_{ik} = \begin{cases} 1/\sqrt{|S_k|} & \text{if } v_i \in S_k \\ 0 & \text{otherwise} \end{cases} \qquad (20)$$

where $i = 1, \cdots, n$ and $k = 1, \cdots, K$. Let $H$ denote
the indicator matrix containing those $K$ indicator vectors as
columns. Observe that $H^T H = I$, $h_k^T h_k = 1$, and

$$h_k^T L_T h_k = \frac{\Theta_{\partial S_k}}{|S_k|}. \qquad (21)$$

Define the mutuality tendency ratio cut $TRCut(S_1, \cdots, S_K)$
for $K > 2$ clusters as follows:

$$TRCut(S_1, \cdots, S_K) = \sum_{k=1}^{K} \frac{\Theta_{\partial S_k}}{|S_k|}, \qquad (22)$$

where the ratio cut reduces to eq.(15) when $K = 2$. The
problem of minimizing TRCut can be formulated as

$$\min_{S_1, \cdots, S_K} TRCut(S_1, \cdots, S_K) = \min_{S_1, \cdots, S_K} \mathrm{tr}(H^T L_T H),$$

s.t.: $H^T H = I$, where $H$ is defined in eq.(20).

One way of solving this problem is to use the method in [25]:
relaxing the discreteness condition gives a standard trace
minimization problem

$$\min_{H \in \mathbb{R}^{|V| \times K}} \mathrm{tr}(H^T L_T H), \quad \text{s.t. } H^T H = I.$$

The optimal solution $H$ contains the first $K$ eigenvectors
of $L_T$ (those associated with the $K$ smallest eigenvalues) as
columns. The clusters can then be obtained by applying the K-means
algorithm to the rows of $H$. The solution obtained approximately
minimizes the mutuality tendency across clusters (which is
equivalent to maximizing the within-cluster mutuality tendency).
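
The steps above (eigen-decomposition followed by K-means on the rows of the eigenvector matrix) can be sketched as follows. Everything here is a hedged illustration: the function names are hypothetical, the expected mutual matrix uses an assumed $(n-1)^2$ normalisation, and the small K-means routine uses a deterministic farthest-first initialisation chosen for reproducibility, which the paper does not specify.

```python
import numpy as np

def tendency_laplacian(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)
    M_bar = np.outer(d, d) / (n - 1) ** 2      # assumed normalisation
    np.fill_diagonal(M_bar, 0.0)
    T = np.minimum(A, A.T) - M_bar
    return np.diag(T.sum(axis=1)) - T

def kmeans(X, K, n_iter=100):
    """Plain Lloyd iterations with farthest-first initial centers."""
    centers = [X[0]]
    for _ in range(1, K):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def tendency_spectral_clustering(A, K):
    eigvals, eigvecs = np.linalg.eigh(tendency_laplacian(A))
    H = eigvecs[:, :K]                         # K smallest eigenvalues
    return kmeans(H, K)

# Toy digraph: three mutual triangles joined by one-way edges 2->3, 5->6, 8->0.
n = 9
A = np.zeros((n, n))
for start in (0, 3, 6):
    A[start:start + 3, start:start + 3] = 1.0
np.fill_diagonal(A, 0.0)
for i, j in [(2, 3), (5, 6), (8, 0)]:
    A[i, j] = 1.0

labels = tendency_spectral_clustering(A, K=3)
print(labels)
```

The label values themselves are arbitrary; only the grouping of nodes matters. For large graphs a sparse eigensolver would be preferable to the dense `eigh` call used here.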

Choice of K. We choose $K$, i.e., the total number of clusters,
using the eigengap heuristic [25]. Theorem 1 shows that $L_T$
has all real eigenvalues. Denote the eigenvalues of $L_T$ in
increasing order, i.e., $\lambda_1 \leq \cdots \leq \lambda_n$. The index of the
largest eigengap, namely, $K := \arg\max_{2 \leq K \leq n} g(K)$, where
$g(K) = \lambda_K - \lambda_{K-1}$, $K = 2, \cdots, n$, indicates how many
clusters there are in the network.
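
A short sketch of the eigengap rule, following the indexing given in the text. The spectrum used here is made up for illustration and `eigengap_k` is a hypothetical helper. Note that some statements of the eigengap heuristic use the gap $\lambda_{K+1} - \lambda_K$ instead, which would return a value one smaller than this function.

```python
import numpy as np

def eigengap_k(eigvals):
    """Return (K, gaps) with K = argmax over K >= 2 of lambda_K - lambda_{K-1}."""
    lam = np.sort(np.asarray(eigvals, dtype=float))
    gaps = np.diff(lam)            # gaps[m] = lam[m + 1] - lam[m], i.e. g(m + 2)
    K = int(np.argmax(gaps)) + 2   # convert the 0-based position to the index K
    return K, gaps

# Made-up spectrum of a tendency Laplacian (may contain negative values).
spectrum = [-0.8, -0.7, 0.0, 2.1, 2.2]
K, gaps = eigengap_k(spectrum)
print(K, gaps)
```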

V. Evaluations 

In this section, we evaluate the performance of the
mutuality-tendency-aware spectral clustering method by comparing
it with digraph spectral clustering algorithms based on various
symmetrization methods. We only present the comparison
results for the adjacency matrix symmetrization method, with
objective matrix $\bar{A} = (A + A^T)/2$. For other settings,
we obtained similar results and omit them here. We will 1)
first test the performance using synthetic datasets, and then
2) apply our method to real online network datasets, e.g., the
Slashdot social network, and discover stable clusters with
respect to mutual and one-way connections.

A. Synthetic datasets 

We first consider synthetic datasets designed specifically to
test the performance of our mutuality-tendency-aware spectral
clustering method. We randomly generate a network with
1000 nodes. There are 38000 directed edges in total (around 3.8%
of all directed node pairs¹), of which one third (around 12666
edges) are bidirectional and two thirds (around 25334 edges) are
unidirectional. The nodes fall into 2 clusters, with 600 and 400
nodes respectively, where around 93.5% of the bidirectional edges
are randomly placed within clusters, and around 80.8% of the
unidirectional edges are randomly placed across clusters.
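
The exact generation procedure is not spelled out, so the following is only one plausible way to produce a digraph with these statistics. It assumes that mutual and one-way dyads are sampled without replacement from the within-cluster and cross-cluster dyads, and that the direction of each one-way edge is chosen uniformly at random; `planted_digraph` and its parameters are hypothetical names.

```python
import numpy as np

def planted_digraph(sizes=(600, 400), n_mutual=6333, n_oneway=25334,
                    mutual_within=0.935, oneway_across=0.808, seed=0):
    """Sample a 0-1 adjacency matrix with planted clusters (assumed procedure)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    i, j = np.triu_indices(n, k=1)                 # all dyads with i < j
    same = labels[i] == labels[j]
    within = rng.permutation(np.flatnonzero(same))
    across = rng.permutation(np.flatnonzero(~same))

    m_in = int(round(mutual_within * n_mutual))    # mutual dyads inside clusters
    m_out = n_mutual - m_in
    o_out = int(round(oneway_across * n_oneway))   # one-way edges across clusters
    o_in = n_oneway - o_out

    # Disjoint slices, so no dyad is both mutual and one-way.
    mutual = np.concatenate([within[:m_in], across[:m_out]])
    oneway = np.concatenate([within[m_in:m_in + o_in],
                             across[m_out:m_out + o_out]])

    A = np.zeros((n, n), dtype=np.int8)
    A[i[mutual], j[mutual]] = 1
    A[j[mutual], i[mutual]] = 1
    flip = rng.random(oneway.size) < 0.5           # random edge direction
    src = np.where(flip, j[oneway], i[oneway])
    dst = np.where(flip, i[oneway], j[oneway])
    A[src, dst] = 1
    return A, labels

A, labels = planted_digraph()
print(int(A.sum()), int(np.minimum(A, A.T).sum()))
```

Here 6333 mutual dyads correspond to the 12666 bidirectional edges, so the total is 38000 directed edges.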

We show in Fig. 2(a)(i)-Fig. 2(a)(iii) that the traditional
spectral clustering algorithm with $\bar{A} = (A + A^T)/2$ as
the objective results in clusters with 180 and 820 nodes
respectively, which does not reflect the underlying structure,
because it clusters nodes without considering the stability
difference between mutual connections and one-way connections.
On the other hand, our proposed mutuality-tendency-aware spectral
clustering method clusters the nodes into groups with exactly 600
and 400 nodes (see Fig. 2(b)(i)-Fig. 2(b)(iii)), which clearly
groups nodes with more mutual (stable) connections together and
separates nodes connected via one-way connections.

Furthermore, given the cluster mutuality tendency $\Theta_S$, we
denote the average mutuality tendency of $S$ as $\bar{\theta}_S = \Theta_S / N_S$,
with $N_S = |S|(|S| - 1)/2$ as the total number of dyads in $S$.

¹As we observed from real social networks, e.g., Slashdot.com [23], an
online commenting network dataset, which will be discussed in the next
section, the sparsity of its "core" network is around 0.19%. Here, we choose
3.8%, that is, 20 times the real network sparsity, just for ease of
visualization of the clustering structure.



TABLE I
Ave. mutuality tendency comparison on synthetic dataset

Method                       $\bar{\theta}_G$   $\bar{\theta}_S$   $\bar{\theta}_{\bar{S}}$   $\bar{\theta}_{\partial S}$
Tendency aware clustering    0.0112             0.0172             0.0314                     8.25e-5
Traditional clustering       0.0112             0.0115             0.0202                     0.0096



Similarly, we have the average mutuality tendency of $G$, $\bar{S}$, and
$\partial S$ as $\bar{\theta}_G = \Theta_G / N_G$,
$\bar{\theta}_{\bar{S}} = \Theta_{\bar{S}} / N_{\bar{S}}$, and
$\bar{\theta}_{\partial S} = \Theta_{\partial S} / (|S||\bar{S}|)$,
respectively. Table I shows the average mutuality tendencies
of the clustering results obtained by the two methods, where we
can see that the mutuality-tendency-aware spectral clustering
algorithm groups nodes together with higher within-cluster
tendencies than traditional spectral clustering. On
the other hand, the cross-cluster tendency obtained using our
method is very close to 0, which means that the dyads across
the clusters establish mutual connections without any
tendency (i.e., purely independently). In addition, we generated
a synthetic dataset with $K > 2$ clusters, and obtained similar
results, shown in Fig. 3.
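
The averages reported in Table I can be computed along the following lines. This sketch assumes the tendencies are sums of dyad tendencies divided by the corresponding number of dyads, as described above, and that each cluster has at least two nodes; the function names, the $(n-1)^2$ normalisation and the toy graph are illustrative.

```python
import numpy as np

def tendency_matrix(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)
    M_bar = np.outer(d, d) / (n - 1) ** 2      # assumed normalisation
    np.fill_diagonal(M_bar, 0.0)
    return np.minimum(A, A.T) - M_bar

def average_tendencies(A, in_S):
    """Average tendencies of G, S, its complement and the cut between them."""
    T = tendency_matrix(A)
    in_S = np.asarray(in_S, dtype=bool)

    def avg_within(mask):
        m = int(mask.sum())
        n_dyads = m * (m - 1) / 2              # number of dyads in the cluster
        return 0.5 * T[np.ix_(mask, mask)].sum() / n_dyads

    n_cross = int(in_S.sum()) * int((~in_S).sum())
    return {
        "G": avg_within(np.ones_like(in_S)),
        "S": avg_within(in_S),
        "S_bar": avg_within(~in_S),
        "dS": T[np.ix_(in_S, ~in_S)].sum() / n_cross,
    }

# Toy digraph: 0 <-> 1 and 2 <-> 3 are mutual; 1 -> 2 and 3 -> 0 are one-way.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
avg = average_tendencies(A, [True, True, False, False])
print(avg)
```

With the two mutual pairs as clusters, the within-cluster averages are positive and the cross-cluster average is negative.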

B. Real Social Networks 

In the second set of simulations, we applied our
mutuality-tendency-aware spectral clustering algorithm to several real
social network datasets, e.g., Slashdot [23], Epinions [19],
and an email communication network [12], and compared it
with digraph clustering algorithms based on various symmetrization
methods, such as $\bar{A} = (A + A^T)/2$, $AA^T$, and $F_\pi = \Pi P$.
Here we only show the comparison results with adjacency
matrix symmetrization based digraph spectral clustering on the
Slashdot dataset. All other settings lead to similar results and
we omit them here for brevity.

Slashdot is a technology-related news website founded in
1997. Users can submit stories, and other users can
comment on them. In 2002, Slashdot introduced the Slashdot
Zoo feature, which allows users to tag each other as friends or
foes. The network data we used is the Slashdot social relation
network, where a directed edge from $i$ to $j$ indicates an interest
from $i$ in $j$'s stories (or topics). Hence, two people with mutual
connections share some common interests, while one-way
connections imply that one is interested in the other's posts, but
the interest is not reciprocated. The Slashdot social
network data was collected and released by Leskovec [23] in
November 2008.

TABLE II
Statistics of Slashdot social network dataset

Nodes                                           77360
Edges                                           828161
Unidirectional edges                            110199
Bidirectional edges                             717962
Nodes in largest SCC                            70355
Edges in largest SCC                            818310
Unidirectional edges in largest SCC             100930
Bidirectional edges in largest SCC              717380
Nodes in the "core" component                   10131
Edges in the "core" component                   197378
Unidirectional edges in the "core" component    21404
Bidirectional edges in the "core" component     175974

Fig. 2. Simulation results on synthetic dataset with K = 2 clusters. Fig. 2(a)(i)-Fig. 2(a)(iii) show the clusters detected by the traditional spectral clustering
algorithm, and Fig. 2(b)(i)-Fig. 2(b)(iii) show the clusters extracted using our mutuality-tendency-aware spectral clustering algorithm.

Fig. 3. This synthetic dataset is generated with K = 3 clusters, with 500, 400 and 300 nodes, respectively. There are 54675 directed edges, among which
27336 edges are bidirectional and 27339 edges are unidirectional. We randomly place 90.02% of the bidirectional edges within clusters, and 89.6% of the
unidirectional edges across clusters. Fig. 3(a)(i)-Fig. 3(a)(iii) show that the traditional spectral clustering algorithm detects clusters with 661, 538 and 1 entities,
respectively, while our method identifies the correct clusters (see Fig. 3(b)(i)-Fig. 3(b)(iii)).

Fig. 4. Simulation results on Slashdot social network dataset. Fig. 4(a)(i)-Fig. 4(a)(iii) show the clusters detected by the traditional spectral clustering algorithm,
and Fig. 4(b)(i)-Fig. 4(b)(iii) show the clusters extracted using our mutuality-tendency-aware spectral clustering algorithm.



The statistics² are shown in Table II. The largest
strongly connected component (SCC) includes 70355
nodes. We then remove those nodes with very low in-degrees
and out-degrees, say no more than 2. By finding the
largest strongly connected component of the remaining graph,
we extract a "core" of the network with 10131 nodes and
197378 edges, among which there are 21404 unidirectional
edges and 175974 bidirectional edges.
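
The core extraction can be sketched in plain Python as below. Two points are assumptions rather than details taken from the text: a node is dropped only when both its in-degree and its out-degree are at most the threshold, and the filter is applied once rather than iteratively. `extract_core` and `largest_scc` are hypothetical helper names, and the SCC search is a standard iterative Kosaraju procedure.

```python
from collections import Counter

def largest_scc(adj):
    """Largest strongly connected component of adj (dict: node -> set of successors)."""
    order, seen = [], set()
    for s in adj:                                  # pass 1: DFS finishing order
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:                                  # all successors explored
                order.append(v)
                stack.pop()
    rev = {v: set() for v in adj}                  # reversed graph
    for v in adj:
        for w in adj[v]:
            rev[w].add(v)
    best, assigned = set(), set()
    for s in reversed(order):                      # pass 2: collect components
        if s in assigned:
            continue
        comp, stack = {s}, [s]
        assigned.add(s)
        while stack:
            v = stack.pop()
            for w in rev[v]:
                if w not in assigned:
                    assigned.add(w)
                    comp.add(w)
                    stack.append(w)
        if len(comp) > len(best):
            best = comp
    return best

def extract_core(edges, max_low_degree=2):
    edges = {(u, v) for u, v in edges if u != v}   # ignore self-loops
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    nodes = set(out_deg) | set(in_deg)
    # Assumed rule: drop nodes whose in- and out-degree are both low.
    keep = {v for v in nodes
            if out_deg[v] > max_low_degree or in_deg[v] > max_low_degree}
    adj = {v: set() for v in keep}
    for u, v in edges:
        if u in keep and v in keep:
            adj[u].add(v)
    core = largest_scc(adj)
    return core, {(u, v) for u, v in edges if u in core and v in core}

# Toy example: a mutual triangle {0, 1, 2} with two pendant nodes 3 and 4.
toy_edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1), (3, 0), (0, 4)]
core_nodes, core_edges = extract_core(toy_edges, max_low_degree=1)
print(sorted(core_nodes), len(core_edges))
```

The toy call uses a threshold of 1 so that the pendant nodes are removed while the triangle survives; the default threshold of 2 mirrors the value mentioned in the text.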

In our evaluations, we observe that there is a large "core" 
of the network, and all other users are attached to this core 
network. In our study, we are interested in extracting the 
community structure from the "core" network. 

When applying our spectral clustering algorithm to the
"core" network, two clusters with 8892 and 1239 nodes are
detected (shown in Fig. 4(b)(i)-Fig. 4(b)(iii)). In our result, a
large portion (about 35.04%) of the cross-cluster edges are
unidirectional, which in turn yields a lower mutuality tendency
across clusters. On the other hand, when using the traditional
symmetrized $\bar{A} = (A + A^T)/2$, two clusters with 9640 and 491
nodes are extracted instead (shown in Fig. 4(a)(i)-Fig. 4(a)(iii)).
The clustering result obtained using the traditional spectral
clustering method has only around 5.75% of the edges across
clusters as unidirectional edges, which boosts the mutuality
tendency across clusters. In our clustering result, by contrast,
more unidirectional edges are placed across clusters, which
decreases the mutuality tendency across clusters. From
Fig. 4(b)(i), we can clearly see that unidirectional (red) edges
dominate the cross-cluster parts.

²Here, the total number of edges is smaller than that shown on the
website [23], because we do not count self-loops.



Table III shows the average mutuality tendency comparison
between the two clustering methods, where we can see that
the mutuality-tendency-aware spectral clustering algorithm
groups nodes together with higher within-cluster tendencies
than traditional spectral clustering.



TABLE III
Ave. mutuality tendency comparison on Slashdot dataset

Method                       $\bar{\theta}_G$   $\bar{\theta}_{S_1}$   $\bar{\theta}_{S_2}$   $\bar{\theta}_{\partial S}$
Tendency aware clustering    0.0017             0.0049                 0.0028                 0.00033
Traditional clustering       0.0017             0.0018                 0.0021                 0.00070



VI. Conclusion 

In this paper, we establish a generalized mutuality tendency
theory to capture the tendencies of clustered node pairs to
establish mutual connections more frequently than would occur
by chance. Based on our mutuality tendency theory, we
develop a mutuality-tendency-aware spectral clustering algorithm
that can detect stable clusters by maximizing the within-cluster
mutuality tendency and minimizing the cross-cluster
mutuality tendency. Extensive simulation results on synthetic
datasets and real online social network datasets, such as Slashdot,
demonstrate that our proposed mutuality-tendency-aware spectral
clustering method resolves more stable social community
structures than traditional spectral clustering methods.